ABSTRACT

A linear classification rule (used with equal covariance matrices) was
contrasted with a quadratic rule (used with unequal covariance matrices)
for accuracy of internal and external classification. The comparisons were
made for seven situations which resulted from combining three data
conditions (equal and unequal covariance matrices, minimal and nonminimal
group centroid separation, and two and three criterion groups) for
different sets of data. For the internal analysis the quadratic rule was
superior in all seven situations. For the external analysis the linear rule
was superior in nearly all of the situations. (Author)

Linear Versus Quadratic
Multivariate Classification 



Introduction 

Multivariate classification may be considered as one aspect of discriminant
analysis, other aspects being separation, discrimination, and estimation. A
classification analysis is primarily applicable to the following problem:
Given p measures associated with an individual (or object), can we predict
the one of K well-defined and exhaustive populations to which this
individual most likely belongs? Classification serves other potentially
useful purposes as well. For example, the proportion of correct
classifications (or assignments) may be used as an index of discriminatory
power of a set of predictors. Results of a classification analysis may also
be used for assessing the relative contribution of the predictors to
criterion population separation.

Various multivariate classification rules have been proposed. Although some
nonparametric rules have been advanced, most research dealing with the study
and application of rules has involved those rules that are parametric in
nature. In particular, rules based on multivariate normal distribution
theory have been the most popular. One criterion for selecting a class of
appropriate rules from those available is the similarity of covariance
structure of the predictors across the K criterion populations. If it can be
assumed, or if the sample data suggest, that the covariance structure is the
same, a "linear" rule is selected; if not, a nonlinear rule would be the
choice. If it is decided that a nonlinear rule would be appropriate, the
choice has typically been a "quadratic" rule. (See Huberty, in press, for
elaboration.)

The equal covariance structure condition has typically been ignored in
applications of multivariate classification, with one or another linear rule
being employed. The question then arises as to whether or not some
predictive accuracy has been lost when a linear (or quadratic) rule rather
than a quadratic (or linear) rule has been used. The purpose of the present
investigation was to compare the accuracy of a linear classification rule
with that of a quadratic rule. The comparison was made under the conditions
considered appropriate for the use of each type of rule.

Data 

Three data conditions were considered in combination to yield eight
"situations." The first condition deals with the equality or inequality of
the predictor variable population covariance matrices; this condition was
assessed via a test proposed by G.E.P. Box, which is a generalization of the
Bartlett test for the homogeneity of K univariate variances (see Cooley and
Lohnes, 1971, p. 229). The second condition is that of the degree of
separation of the K population centroids (or mean vectors), as assessed by
Wilks's lambda criterion (see Cooley and Lohnes, 1971, p. 226). (The Wilks's
criterion was employed recognizing that its appropriateness depends,
strictly, upon the condition of equal covariance matrices.) The third
condition is the number of criterion groups studied.
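
The centroid-separation condition can be illustrated with a short sketch.
The code below is not taken from the study; it assumes the usual definition
of Wilks's lambda as the ratio of the determinant of the within-groups
sums-of-squares-and-cross-products matrix W to that of the total matrix T.
The function names (`wilks_lambda`, `separation_label`) are hypothetical,
and the .80 cutoff is the arbitrary one adopted in the Results section.

```python
import numpy as np

def wilks_lambda(X, y):
    """Wilks's lambda = det(W) / det(T) for data X (n x p) and labels y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    centered_total = X - X.mean(axis=0)
    T = centered_total.T @ centered_total      # total SSCP matrix
    W = np.zeros_like(T)
    for g in np.unique(y):
        Xg = X[y == g]
        centered = Xg - Xg.mean(axis=0)
        W += centered.T @ centered             # within-groups SSCP matrix
    # Work with log-determinants to avoid overflow for larger p.
    _, logdet_w = np.linalg.slogdet(W)
    _, logdet_t = np.linalg.slogdet(T)
    return float(np.exp(logdet_w - logdet_t))

def separation_label(lam, cutoff=0.80):
    """Label separation as in the study: lambda > .80 is 'minimal'."""
    return "minimal" if lam > cutoff else "nonminimal"
```

Values of lambda near 1 indicate centroids that are barely separated
relative to within-group spread; values near 0 indicate strong separation.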

As mentioned above, when there is insufficient evidence to conclude that the
covariance matrices are unequal, a linear rule is generally considered
appropriate. Under this condition, a linear rule was contrasted with a
quadratic rule for minimal and nonminimal centroid separation for two and
three criterion groups, four situations resulting. When the data suggested
that the covariance matrices were unequal, the two rules were again
contrasted for the four situations.



Three data sets were employed for the comparisons. Within Set A the subjects
are public school reading teachers: the 10 predictor measures used are
measures of knowledge of reading and of teacher background; the criterion
groups are defined by method of reading instruction employed. Data Set B is
based on college freshmen: measures on high school academic performance,
standardized tests of French achievement, and nationally normed tests for
college bound students provide scores on 13 predictors; the criterion groups
are defined by instructor judgment of student placement in college French
classes. The subjects of data Set C are high school students: the 17
predictor measures are cognitive, interest, personality, and socioeconomic
status measures; criterion groups are based upon post-secondary educational
placement. To provide data that indicated equal covariance structures,
complete groups were deleted from each data set, retaining unequal group
sizes. A situation with three criterion groups that are minimally separated,
for which a linear classification rule would be appropriate, was not
investigated, since data for such a situation were unavailable. Thus, seven
of eight possible data situations were considered.

Data Analysis 

The linear classification rule used in this study is based on a Bayesian
conditional-probability model assuming multivariate normality within each
criterion population, and constant covariance structure across the criterion
populations. The classification statistic is a function of sample mean
vectors and the within-groups covariance matrix. Defining

$$D_{ik}^2 = (X_i - \bar{X}_k)' S^{-1} (X_i - \bar{X}_k)$$

to be the square of the distance from the point in p-space representing
individual i ($X_i$) to the point representing the means of the p measures
in group k ($\bar{X}_k$), where S is the pooled sample (p x p) covariance
matrix, the following classification statistic was used:

$$P_{ik} = \frac{p_k \exp(-\tfrac{1}{2} D_{ik}^2)}{\sum_{k'=1}^{K} p_{k'} \exp(-\tfrac{1}{2} D_{ik'}^2)}$$

where $p_k$ is the prior probability of membership in population k. This
latter expression represents the (posterior) probability of individual i
belonging to population k. An individual is classified into that population
from which the sample yields the largest value of $P_{ik}$. The value of
$p_k$ used in this study is $N_k/N$, where $N_k$ is the size of the sample
selected from population k, and $N = \sum_k N_k$.
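
The linear rule described above can be sketched in a few lines. This is an
illustration under stated assumptions, not the program used in the study:
the function names (`fit_linear`, `linear_posteriors`) are hypothetical, and
the pooled covariance matrix is assumed to use the divisor N - K, which the
text does not specify. Priors are the sample proportions N_k/N, as in the
study.

```python
import numpy as np

def fit_linear(X, y):
    """Return group labels, mean vectors, pooled covariance S, and priors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    groups = np.unique(y)
    n, p = X.shape
    means = np.array([X[y == g].mean(axis=0) for g in groups])
    W = np.zeros((p, p))
    for g, m in zip(groups, means):
        d = X[y == g] - m
        W += d.T @ d
    S = W / (n - len(groups))          # assumed divisor: N - K
    priors = np.array([(y == g).sum() / n for g in groups])  # p_k = N_k / N
    return groups, means, S, priors

def linear_posteriors(x, means, S, priors):
    """Posterior probabilities P_ik for one observation x."""
    S_inv = np.linalg.inv(S)
    diffs = means - np.asarray(x, dtype=float)
    # Squared distance D^2_ik from x to each group centroid.
    d2 = np.einsum("kp,pq,kq->k", diffs, S_inv, diffs)
    # Subtracting the smallest D^2 leaves the ratios unchanged
    # and keeps the exponentials from underflowing.
    w = priors * np.exp(-0.5 * (d2 - d2.min()))
    return w / w.sum()
```

The individual is assigned to the group with the largest posterior, for
example `groups[np.argmax(linear_posteriors(x, means, S, priors))]`.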

The quadratic classification rule used is similar to the linear rule except
that the sample covariance matrix for each group ($S_k$) is used in place of
S, with the determinants of the matrices incorporated (see Cooley & Lohnes,
1971, p. 268).
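
A sketch of the quadratic rule follows. It assumes the usual normal-theory
form, in which each group's weight is its prior times the inverse square
root of the determinant of its own covariance matrix times the exponential
of minus half the squared distance computed with that matrix. The exact
expression in Cooley and Lohnes is not reproduced in the text, so this form
and the function name `quadratic_posteriors` are assumptions.

```python
import numpy as np

def quadratic_posteriors(x, means, covs, priors):
    """Posterior probabilities using a separate covariance matrix per group."""
    x = np.asarray(x, dtype=float)
    log_w = []
    for m, S_k, p_k in zip(means, covs, priors):
        S_k = np.asarray(S_k, dtype=float)
        diff = x - np.asarray(m, dtype=float)
        d2 = diff @ np.linalg.solve(S_k, diff)   # distance based on S_k
        _, logdet = np.linalg.slogdet(S_k)       # log |S_k|
        log_w.append(np.log(p_k) - 0.5 * logdet - 0.5 * d2)
    log_w = np.array(log_w)
    w = np.exp(log_w - log_w.max())              # stabilize before normalizing
    return w / w.sum()
```

Because each group contributes its own p(p + 1)/2 covariance parameters, the
quadratic rule estimates many more quantities than the linear rule, which is
the point taken up in the Discussion.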

In comparing the accuracy of prediction of the linear rule to that of the
quadratic rule, both "internal" and "external" classification results were
considered. Results of an internal classification analysis are those
obtained when measures for the individuals on whom the statistics
($\bar{X}_k$ and S or $S_k$) were based are resubstituted to obtain the
$P_{ik}$ values. In an external classification analysis statistics based on
one set of individuals are used in classifying "new" individuals. The
external classification method used in this study is an extension of that
suggested by Lachenbruch (1967). The procedure for the Lachenbruch method is
as follows: Compute the statistics for each of the possible total samples of
size $\sum_k N_k - 1$ obtained by omitting one individual's vector from the
original total sample, and record for each computation whether the omitted
individual is misclassified. In calculating the $P_{ik}$ values for both the
linear and quadratic rules, matrix inversions are required, but the labor
can be reduced to merely adjusting the inverses based on all $\sum_k N_k$
individuals. Expressions¹ for the adjustments of $S^{-1}$, $S_k^{-1}$, and
the mean vectors are given by Eisenbeis and Avery (1972, p. 100).
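
The leave-one-out procedure can be sketched as below. This version simply
refits the statistics for every omitted individual; it does not use the
inverse-adjustment shortcut cited in the text, so it is slower but easier to
check. The function names are hypothetical, the pooled covariance divisor
(N - K) is an assumption, and a linear rule is used as the default
classifier only for illustration.

```python
import numpy as np

def classify_linear(x, X_train, y_train):
    """Assign x to a group with a linear rule fitted on the training data."""
    groups = np.unique(y_train)
    n, p = X_train.shape
    means = [X_train[y_train == g].mean(axis=0) for g in groups]
    W = np.zeros((p, p))
    for g, m in zip(groups, means):
        d = X_train[y_train == g] - m
        W += d.T @ d
    S = W / (n - len(groups))                    # assumed divisor: N - K
    scores = []
    for g, m in zip(groups, means):
        diff = x - m
        d2 = diff @ np.linalg.solve(S, diff)
        prior = (y_train == g).mean()            # p_k = N_k / N
        scores.append(np.log(prior) - 0.5 * d2)  # log of the numerator of P_ik
    return groups[int(np.argmax(scores))]

def leave_one_out_correct(X, y, classify=classify_linear):
    """Boolean array: was each individual classified correctly when omitted?"""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    keep = np.ones(len(y), dtype=bool)
    correct = np.zeros(len(y), dtype=bool)
    for i in range(len(y)):
        keep[i] = False                          # omit individual i
        correct[i] = classify(X[i], X[keep], y[keep]) == y[i]
        keep[i] = True                           # restore for the next pass
    return correct
```

The mean of the returned array is the external proportion of correct
classifications across all groups.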

Separate group as well as total group proportions of correct classifications
were compared for the linear and quadratic rules; McNemar's chi-square
statistic was used in the statistical comparisons of the total sample
proportions. Measures of distances (Mahalanobis $D^2$ with modifications for
unequal covariance matrices) in multivariate spaces between pairs of group
mean vectors were examined to determine group proximity. An "arrant
misclassification" is defined as one in which a misclassified individual is
assigned to a population other than the one "closest" to his actual
population. The two rules were compared in terms of the number of arrant
misclassifications for both the internal and external analyses.
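
McNemar's comparison of two rules applied to the same individuals can be
sketched as follows. The text does not say whether a continuity correction
was applied, so the correction is left as an option and is off by default;
the function name `mcnemar` is hypothetical. Only the individuals classified
correctly by one rule and incorrectly by the other enter the statistic.

```python
import math

def mcnemar(correct_a, correct_b, continuity=False):
    """Return (chi-square, p-value) with 1 df for paired correct/incorrect."""
    # b: correct under rule A only; c: correct under rule B only.
    b = sum(1 for a, q in zip(correct_a, correct_b) if a and not q)
    c = sum(1 for a, q in zip(correct_a, correct_b) if q and not a)
    if b + c == 0:
        return 0.0, 1.0
    diff = abs(b - c) - (1 if continuity else 0)
    chi2 = max(diff, 0) ** 2 / (b + c)
    # Upper-tail probability of a chi-square variate with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```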

Results 

Means, standard deviations, univariate ANOVA mean-square ratios, and
within-groups intercorrelations of the predictors were determined for each
situation. Tables of such values are available upon request.

The seven data situations investigated were characterized by (a) number of
criterion groups, (b) group separation, and (c) the appropriate, in terms of
covariance structure, classification rule (see Table 1). A situation with
minimal separation was arbitrarily defined to be one for which Λ > .80; for
nonminimal separation, Λ ≤ .80. Thus, in situations I, IV, and V the groups
are minimally separated. If the F statistic used to test the equality of the
population covariance matrices yielded significance (p < .05) a quadratic
rule was judged appropriate, otherwise a linear rule was considered
appropriate.

Insert Table 1 about here 

The results of the internal and external classification analysis for the
linear and quadratic rules are reported in Tables 2 through 8. The expected
proportions given in the tables are based on the marginal sums for each
classification matrix. The groups are listed in the "order" determined by
the multivariate distance measures. Various results are clear from the
tables. First, consider a comparison of the linear and the quadratic rules
for the internal analysis. The proportion of correct classifications across
all criterion groups is significantly higher for the quadratic rule than for
the linear rule in all situations; the smallest value of McNemar's
chi-square statistic was 5.76 with p < .025. And with two exceptions the
quadratic rule outperforms the linear rule in terms of proportions of
correct classifications for separate groups. One exception is for situation
V (see Table 6) where the proportion with the linear rule for group 1
(53/65 = 0.82) is slightly higher than that with the quadratic rule for
group 1 (51/65 = 0.78); note that group 1 is the largest group. The other
exception is for situation VII (see Table 8) where the group 3 proportion
with the linear rule (161/200 = 0.805) is about the same as that with the
quadratic rule (159/200 = 0.795); identical proportions resulted for group
1. Note again that groups 3 and 1 are the largest groups. The number of
arrant misclassifications (appropriately considered only in situations in
which three groups were involved) was less with the quadratic rule in
situation V (see Table 6, 21 versus 31) and in situation VII (see Table 8,
58 versus 67).
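
The observed and expected proportions reported in the tables can be
recomputed from any classification matrix. The sketch below assumes the
expected proportion is the chance-agreement value formed from the products
of corresponding row and column totals, which matches the description
"based on the marginal sums"; the function name is hypothetical. The
example uses the internal linear frequencies for situation II (Table 3).

```python
def observed_and_expected(table):
    """Return (P_o, P_e) for a square table of actual-by-classified counts."""
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed proportion: diagonal (correct) frequencies over N.
    p_o = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Expected proportion: sum of row-total x column-total products over N^2.
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return p_o, p_e

p_o, p_e = observed_and_expected([[30, 5], [5, 76]])
print(round(p_o, 3), round(p_e, 3))  # 0.914 0.579
```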

Insert Tables 2-8 about here

Second, consider a comparison of the two rules for the external analysis.
For situations I, IV, and V (see Tables 2, 5, and 6) the across-group
proportions were about what would be expected by chance classification, for
the given marginal sums; note that for all three of these situations the
group separation was minimal. For situations II and III (see Tables 3 and 4)
in which the linear rule was judged appropriate and separation was
nonminimal, the linear rule did better, with the difference being
statistically significant (p < .05) for situation III. The linear rule also
gave better results (.648 versus .604, p < .05) for situation VII (see Table
8), where the quadratic rule was appropriate and separation was nonminimal.
For situation VI (see Table 7) where the quadratic rule was appropriate and
separation was nonminimal, the quadratic rule was clearly better (p < .001).
For all situations but one, the proportion of correct classifications for
the largest group was highest with the linear rule; the exception was
situation VII where the quadratic rule yielded 87.6% correct classifications
while the linear rule yielded 84.5%. The linear rule also yielded fewer
arrant misclassifications for situation V (see Table 6, 21 versus 37), while
the numbers were identical for situation III (see Table 4), and nearly the
same for situation VII (see Table 8).

Third, consider a comparison of the internal analysis and the external
analysis. As to be expected, the internal analysis yielded higher
proportions of correct classifications than the external analysis in all
situations save one for both rules. The lone exception was for situation VI
(see Table 7) where the proportions were identical with the quadratic rule.

Discussion 

If, in a study calling for a multivariate classification analysis, interest
is primarily on obtaining a high proportion of correct classifications in an
"internal" sense, then a quadratic rule should always be used in preference
to a linear rule. With this concern the quadratic rule would be used
regardless of the covariance structure of the data. However, if the concern
is for high classification accuracy for a new data set (i.e., "external"
classification), then, based on the results of the current investigation, a
quadratic rule should not always be used. It was found that the linear rule
yielded a higher across-group proportion of correct classifications for an
external analysis for two situations involving three criterion groups that
have nonminimal separation. That a linear rule did better than a quadratic
rule in an external sense is presumably due to the fact that fewer
parameters need be estimated with the linear rule. It is conjectured that
the results of an external analysis would be improved if only the "better"
predictors were used in the analysis. (This conjecture was supported by the
results of an external analysis of the data of situation VII with only nine
of the predictor measures used. The results are given in Table 9.) With
regard to separate group classification accuracy, based on the results of
this study, it might be recommended that a linear rule be used when interest
is mainly on getting high accuracy for the largest criterion group.

Insert Table 9 about here




Whereas proportions of correct classifications obtained from an internal
classification analysis are known to constantly overestimate the true
proportions (i.e., probabilities), external classification gives an
underestimation. The difference between proportions yielded by the two
analyses indicates the interval in which the "optimal probability" can be
expected to lie. If there is a great difference between the two proportions,
one can expect to achieve better classification of new samples by increasing
sample sizes (Michaelis, 1973, p. 233).

The present investigation represents only a beginning. More empirical
investigations are needed in the study of linear versus quadratic
classification, using both internal and external analyses. Perhaps some
Monte Carlo studies are called for, taking into consideration such factors
as covariance structure, number of predictors, sample sizes, group
separation, and predictor intercorrelations, to list a few.

Footnote

¹The Eisenbeis and Avery expressions for the adjustments of $S^{-1}$ and of
$S_k^{-1}$ are in error. In each, the first sign within the brackets should
be plus rather than minus.



Table 1

Description of the Seven Data Situations Investigated

Situation  Number of  Number of  Sample        Wilks's  F and df values for     Appropriate
           Groups     Variables  Sizes         Lambda   Equality of Covariance  Rule
                                                        Matrices

I          2          10         65, 47        .9464    1.096; df: 55, (a)      Linear
II         2          13         35, 81        .3583    1.043; df: 91, (a)      Linear
III        3          13         35, 81, 37    .2313    1.152; df: 182, (a)     Linear
IV         2          10         65, 40        .8923    1.589; df: 55, (a)      Quadratic
V          3          10         65, 47, 40    .9119    1.278; df: 110, (a)     Quadratic
VI         2          17         26, 200       .7672    1.431; df: 153, (a)     Quadratic
VII        3          17         177, 75, 200  .5509    1.650; df: 306, (a)     Quadratic

(a) df value greater than 10,000.



Table 2

Frequencies of Classifications for Situation I
(Equal Covariances, Minimal Separation, Two Groups)

                         Internal Classification     External Classification
                            Classified Group            Classified Group
Rule        Actual Group      1      2   Total            1      2   Total

Linear           1           56      9     65            47     18     65
                 2           31     16     47            39      8     47
                            P_o = [illegible]           P_o = [illegible]
                            P_e = [illegible]           P_e = [illegible]

Quadratic        1           57      8     65            43     22     65
                 2           19     28     47            34     13     47
                            P_o = .759                  P_o = .500
                            P_e = .529                  P_e = .531

P_o = observed proportion of correct classifications across all groups.
P_e = expected proportion of correct classifications across all groups.



Table 3

Frequencies of Classifications for Situation II
(Equal Covariances, Nonminimal Separation, Two Groups)

                         Internal Classification     External Classification
                            Classified Group            Classified Group
Rule        Actual Group      1      2   Total            1      2   Total

Linear           1           30      5     35            29      6     35
                 2            5     76     81             7     74     81
                            P_o = .914                  P_o = .888
                            P_e = .579                  P_e = .575

Quadratic        1           33      2     35            23     12     35
                 2            1     80     81             7     74     81
                            P_o = .974                  P_o = .836
                            P_e = .582                  P_e = .596

Table 4

Frequencies of Classifications for Situation III
(Equal Covariances, Nonminimal Separation, Three Groups)

                         Internal Classification        External Classification
                            Classified Group               Classified Group
Rule        Actual Group      1     2     3   Total          1     2     3   Total

Linear           1           30     5     0     35          29     6     0     35
                 2            7    71     3     81           7    70     4     81
                 3            0     5    32     37           0     8    29     37
                            P_o = .869                     P_o = .837
                            P_e = .391                     P_e = .397

Quadratic        1           33     2     0     35          23    12     0     35
                 2            1    77     3     81           7    68     6     81
                 3            0     3    34     37           0    12    25     37
                            P_o = .941                     P_o = .758
                            P_e = .393                     P_e = .411

Table 5

Frequencies of Classifications for Situation IV
(Unequal Covariances, Minimal Separation, Two Groups)

                         Internal Classification     External Classification
                            Classified Group            Classified Group
Rule        Actual Group      1      2   Total            1      2   Total

Linear           1           55     10     65            52     13     65
                 2           24     16     40            27     13     40
                            P_o = .676                  P_o = .619
                            P_e = .560                  P_e = .560

Quadratic        1           56      9     65            44     21     65
                 2           14     26     40            26     14     40
                            P_o = .781                  P_o = .552
                            P_e = .540                  P_e = .540

Table 6

Frequencies of Classifications for Situation V
(Unequal Covariances, Minimal Separation, Three Groups)

                         Internal Classification        External Classification
                            Classified Group               Classified Group
Rule        Actual Group      1     2     3   Total          1     2     3   Total

Linear           1           53     4     8     65          44    13     8     65
                 2           28    11     8     47          36     1    10     47
                 3           23     4    13     40          23    11     6     40
                            P_o = .507                     P_o = .336
                            P_e = .382                     P_e = .382

Quadratic        1           51     6     8     65          33    15    17     65
                 2           16    25     6     47          28     8    11     47
                 3           13     6    21     40          20    14     6     40
                            P_o = .638                     P_o = .309
                            P_e = .361                     P_e = .362

Table 7

Frequencies of Classifications for Situation VI
(Unequal Covariances, Nonminimal Separation, Two Groups)

                         Internal Classification     External Classification
                            Classified Group            Classified Group
Rule        Actual Group      1      2   Total            1      2   Total

Linear           1           11     15     26             5     21     26
                 2            8    192    200             9    191    200
                            P_o = .898                  P_o = .867
                            P_e = .820                  P_e = .838

Quadratic        1           26      0     26            26      0     26
                 2            2    198    200             2    198    200
                            P_o = .991                  P_o = .991
                            P_e = .790                  P_e = .790

Table 8

Frequencies of Classifications for Situation VII
(Unequal Covariances, Nonminimal Separation, Three Groups)

                         Internal Classification        External Classification
                            Classified Group               Classified Group
Rule        Actual Group      1     2     3   Total          1     2     3   Total

Linear           1          137    10    30    177         129    12    36    177
                 2           40    13    22     75          44     9    22     75
                 3           37     2   161    200          41     4   155    200
                            P_o = .688                     P_o = .648
                            P_e = .403                     P_e = [illegible]

Quadratic        1          137    10    30    177         115    24    38    177
                 2           14    48    13     75          35    16    24     75
                 3           28    13   159    200          37    21   142    200
                            P_o = .761                     P_o = .604
                            P_e = .379                     P_e = .384

Table 9

Frequencies of Classifications Using Nine Measures of Situation VII
(Unequal Covariances, Nonminimal Separation, Three Groups)

                         Internal Classification        External Classification
                            Classified Group               Classified Group
Rule        Actual Group      1     2     3   Total          1     2     3   Total

Linear           1          135     4    38    177         132     7    38    177
                 2           39    11    25     75          42     7    26     75
                 3           38     3   159    200          41     4   155    200
                            P_o = .675                     P_o = .650
                            P_e = .408                     P_e = .407

Quadratic        1          134     9    34    177         121    15    41    177
                 2           31    24    20     75          40    12    23     75
                 3           35     7   158    200          39    10   151    200
                            P_o = .699                     P_o = .628
                            P_e = .395                     P_e = .397



